Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines

Authors

Pieter Ghysels

Thomas J. Ashby

Karl Meerbergen

Wim Vanroose

Abstract

In the Generalized Minimal Residual Method (GMRES), the global all-to-all communication required in each iteration for orthogonalization and normalization of the Krylov basis vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise and load imbalance turn these global reductions into very costly global synchronizations. In this work, we propose the use of non-blocking or asynchronous global reductions to hide these global communication latencies by overlapping them with other communications and calculations. A pipelined variation of GMRES is presented in which the result of a global reduction is only used one or more iterations after the communication phase has started. This way, global synchronization is relaxed and scalability is much improved at the expense of some extra computations. The numerical instabilities that inevitably arise from the typical monomial basis, built by repeatedly powering the matrix, are reduced and often annihilated by using Newton or Chebyshev bases instead. We model the performance on massively parallel machines with an analytical model.
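The overlap idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: a background thread stands in for a non-blocking global reduction (something like MPI_Iallreduce on a real machine), the nested lists stand in for per-process vector chunks, and the names `global_dot`, `local_matvec` and `overlapped_step` as well as the 0.05 s latency are assumptions made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def global_dot(chunks_x, chunks_y, latency=0.05):
    """Stand-in for a global reduction over per-'rank' partial dot products.

    The sleep models network latency; the value is an assumption,
    not a measurement.
    """
    time.sleep(latency)
    return sum(
        sum(a * b for a, b in zip(cx, cy))
        for cx, cy in zip(chunks_x, chunks_y)
    )


def local_matvec(chunk):
    """Hypothetical local work: apply a diagonal 'matrix' 2*I to one chunk."""
    return [2.0 * v for v in chunk]


def overlapped_step(chunks_x, chunks_y):
    """Start the reduction, do local work meanwhile, then wait for the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the reduction without waiting for it.
        pending = pool.submit(global_dot, chunks_x, chunks_y)
        # Local computation proceeds while the reduction is in flight.
        next_chunks = [local_matvec(c) for c in chunks_x]
        # Block only at the point where the reduced value is needed.
        dot = pending.result()
    return dot, next_chunks


x = [[1.0, 2.0], [3.0, 4.0]]   # two 'ranks', two entries each
y = [[1.0, 1.0], [1.0, 1.0]]
dot, nxt = overlapped_step(x, y)
print(dot, nxt)
```

In this toy setup the local work does not depend on the reduced value, which is what makes the overlap possible. In a pipelined solver the algorithm has to be rearranged so that such independent work exists, which is where the extra computations mentioned in the abstract come from.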

Similar resources

The Communication-Hiding Conjugate Gradient Method with Deep Pipelines

Krylov subspace methods are among the most efficient present-day solvers for large scale linear algebra problems. Nevertheless, classic Krylov subspace method algorithms do not scale well on massively parallel hardware due to the synchronization bottlenecks induced by the computation of dot products throughout the algorithms. Communication-hiding pipelined Krylov subspace methods offer increase...

Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm

Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by only performing a single non-blocking reduction per iteration. This global communi...

A Global GMRES/Multi-Grid Scheme for an Adaptive Cartesian/Quad Grid Flow Solver on Distributed Memory Machines

A global multi-grid/GMRES solution methodology on distributed memory machines is successfully developed in this study. To preserve the effectiveness of the multigrid scheme, the grid partitioning is based on the communication graph of the coarsest grid, so that all levels of the multi-grids are located in the same zone (processor). Each node of the graph is weighted with the total number of the...

The communication-hiding pipelined BiCGstab method for the parallel solution of large unsymmetric linear systems

A High Performance Computing alternative to traditional Krylov subspace methods, pipelined Krylov subspace solvers offer better scalability in the strong scaling limit compared to standard Krylov subspace methods for large and sparse linear systems. The typical synchronization bottleneck is mitigated by overlapping time-consuming global communication phases with local computations in the algori...

Solving the Problem of Scheduling Unrelated Parallel Machines with Limited Access to Jobs

Nowadays, by successful application of on time production concept in other concepts like production management and storage, the need to complete the processing of jobs in their delivery time is considered a key issue in industrial environments. Unrelated parallel machines scheduling is a general mood of classic problems of parallel machines. In some of the applications of unrelated parallel mac...

Journal:

SIAM J. Scientific Computing

Volume 35

Publication date: 2013
